ETL Ensembles for Chunking, NER and SRL

نویسندگان

  • Cícero Nogueira dos Santos
  • Ruy Luiz Milidiú
  • Carlos E. M. Crestana
  • Eraldo Rezende Fernandes
چکیده

We present a new ensemble method that uses Entropy Guided Transformation Learning (ETL) as the base learner. The proposed approach, ETL Committee, combines the main ideas of Bagging and Random Subspaces. We also propose a strategy to include redundancy in transformation-based models. To evaluate the effectiveness of the ensemble method, we apply it to three Natural Language Processing tasks: Text Chunking, Named Entity Recognition and Semantic Role Labeling. Our experimental findings indicate that ETL Committee significantly outperforms single ETL models, achieving state-of-the-art competitive results. Some positive characteristics of the proposed ensemble strategy are worth to mention. First, it improves the ETL effectiveness without any additional human effort. Second, it is particularly useful when dealing with very complex tasks that use large feature sets. And finally, the resulting training and classification processes are very easy to parallelize.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrase Chunking Using Entropy Guided Transformation Learning

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better r...

متن کامل

Entropy Guided Transformation Learning

This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. It generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. We also present ETL Committee, an ensemble method that uses ETL as the base learner. The main advantage of ETL is its easy applicability ...

متن کامل

A Feature-Rich Vietnamese Named-Entity Recognition Model

In this paper, we present a feature-based named-entity recognition (NER) model that achieves the start-of-the-art accuracy for Vietnamese language. We combine word, word-shape features, PoS, chunk, Brown-cluster-based features, and word-embedding-based features in the Conditional Random Fields (CRF) model. We also explore the effects of word segmentation, PoS tagging, and chunking results of ma...

متن کامل

Segment-Level Neural Conditional Random Fields for Named Entity Recognition

We present Segment-level Neural CRF, which combines neural networks with a linear chain CRF for segment-level sequence modeling tasks such as named entity recognition (NER) and syntactic chunking. Our segment-level CRF can consider higher-order label dependencies compared with conventional word-level CRF. Since it is difficult to consider all possible variable length segments, our method uses s...

متن کامل

Named Entity Recognition in Tweets: An Experimental Study

People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010